Introduction

Carola Lilienthal's talk about architecture and technical debt at Herbstcampus 2017 reminded me that I wanted to implement some of the examples from her book "Long-lived software systems" (available only in German) with jQAssistant. In particular, the visualization of the dependencies between different business domains seems like a great starting point for trying out some stuff:

The green connections between the modules show the downward dependencies to other modules and the red ones the upward dependencies. This visualization can help you if you want to further modularize your system along your business subdomains or to identify unwanted dependencies between modules.

At the same time, I started the Java Type Dependency Analysis and realized that from there it is only a small step to analyzing dependencies between business domains. What's missing is the information about which type belongs to which business domain. We'll find out now!

A simple case study

I once developed a party planning application called DropOver (that didn't go live, but that's another story). We wrote that web application in Java and paid special attention to structuring the code along the subdomains of the business domain "partying". This led to a package structure that mirrors the main parts of the application:

The application's main entry point is a site for a party, including location, time, the site's creator and so on. A user can comment on a site as well as add specific widgets like todo lists, scheduling or file uploads, and gets notified by the mail feature. There is also a special package framework where all the cross-cutting concerns are placed, like the dependency injection configuration or common, technical software elements.

The main point to take away here is that, thanks to the alignment of the package structure with the business subdomains, it's easy to determine the business domain for a software entity. It's the third segment of the Java package name:

at.dropover.<subdomain>.

Thanks to this naming convention, the subdomain of any type can be derived directly from its fully qualified name.
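As a minimal sketch of that rule outside the database, the subdomain can be read from the third segment of a fully qualified name. The helper name extract_subdomain, its parameters and the fallback value are made up for illustration and are not part of jQAssistant or the original notebook:

```python
def extract_subdomain(fqn, position=2, default=None):
    """Return the package segment at `position` (0-based) of a fully qualified name.

    Assumption: names follow the at.dropover.<subdomain>... convention, so the
    subdomain is the third segment. Names without a package at that position
    yield `default`.
    """
    parts = fqn.split(".")
    # the last segment is the class name itself, never a package
    if len(parts) - 1 <= position:
        return default
    return parts[position]


print(extract_subdomain("at.dropover.scheduling.interactor.GetSchedulings"))
```

Unlike a bare split, this variant does not mistake a class that sits directly in at.dropover for a subdomain.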

Software from a graph's perspective

I've built the web application, scanned the software artifact (a standard JAR file that we export for integration testing purposes) with the jQAssistant command line tool (with jqassistant.sh scan -f dropover-classes.jar in this case) and started the server (with jqassistant.sh server). Taking a look at the accompanying Neo4j Browser, we can see the graph that jQAssistant stored in Neo4j. For example, we can display the relationship between the JAR file and the contained Java types:

In the following, I set up the connection between my Python glue code and the Neo4j database. The query simply lists all Java types of the application (or, more precisely, of the JAR artifact). As mentioned above, we can also derive the subdomain from the package name:


In [153]:
import py2neo
import pandas as pd

query="""
MATCH
    (:Jar:Archive)-[:CONTAINS]->(type:Type)
RETURN
    type.fqn AS type, SPLIT(type.fqn, ".")[2] AS subdomain
"""

graph = py2neo.Graph()
subdomaininfo = pd.DataFrame(graph.run(query).data())
subdomaininfo.head()


Out[153]:
subdomain type
0 scheduling at.dropover.scheduling.interactor.GetSchedulings
1 scheduling at.dropover.scheduling.interactor.validation.S...
2 site at.dropover.site.entity.Site
3 files at.dropover.files.boundary.UploadFileRequestModel
4 scheduling at.dropover.scheduling.entity.gateway.inmemory...

The query returns the corresponding subdomain for each type. Combined with the approach in Java Type Dependency Analysis, we can now visualize the dependencies between the various subdomains:


In [161]:
import json

query="""
MATCH
    (:Jar:Archive)-[:CONTAINS]->
    (type:Type)-[:DEPENDS_ON]->(directDependency:Type)
    <-[:CONTAINS]-(:Jar:Archive)
RETURN 
    SPLIT(type.fqn, ".")[2] AS name, 
    COLLECT(DISTINCT SPLIT(directDependency.fqn, ".")[2]) AS imports
"""

graph = py2neo.Graph()
json_data = graph.run(query).data()

with open("vis/flare-imports.json", mode='w') as json_file:
    json_file.write(json.dumps(json_data, indent=3))

json_data[:2]


Out[161]:
[{'imports': ['comment', 'framework', 'creator', 'site'], 'name': 'comment'},
 {'imports': ['site', 'mail', 'framework', 'creator'], 'name': 'mail'}]

In the output, we see the dependencies between the various subdomains: each entry names a subdomain and lists the subdomains it depends on.

I've altered the visualization just a little bit so that we can see bidirectional dependencies as well. Those are green and red at the same time and appear more dominant than unidirectional dependencies.
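To make the idea of a bidirectional dependency concrete, here is a small sketch that finds such pairs in data shaped like the name/imports records above. The function name find_bidirectional and the sample records are hypothetical; the visualization itself may detect these pairs differently.

```python
def find_bidirectional(records):
    """Return sorted (a, b) pairs where a imports b and b imports a."""
    imports = {record["name"]: set(record["imports"]) for record in records}
    pairs = set()
    for name, targets in imports.items():
        for target in targets:
            # ignore self-references, then check for the reverse direction
            if target != name and name in imports.get(target, set()):
                pairs.add(tuple(sorted((name, target))))
    return sorted(pairs)


# hypothetical sample in the name/imports format, not real DropOver data
sample = [
    {"name": "a", "imports": ["a", "b", "c"]},
    {"name": "b", "imports": ["a"]},
    {"name": "c", "imports": []},
]
print(find_bidirectional(sample))
```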

From the visualization above, we can see that the creator subdomain is used by Java source code from the subdomains comment, site, scheduling, mail and framework. The first four make perfect sense: if you create one of those content types in the application, it is created by some person (they are "personalized" content). In contrast, todo and files are user-agnostic content types and thus don't have any dependencies on creator (that's a tricky situation in retrospect). What could look like a mess are the dependencies from and to framework. The pseudo subdomain framework contains some base classes for all the data objects that are persisted in a data store. That explains the outbound dependency of creator. The inbound dependencies from framework to creator are needed for the central dependency injection configuration of the application.
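The "is used by" reading can also be derived from the exported records by inverting the imports lists. The following sketch uses only the two records shown in the output earlier, so the result is necessarily incomplete; the helper name inbound_dependencies is an assumption for illustration.

```python
from collections import defaultdict


def inbound_dependencies(records):
    """Map each subdomain to the sorted list of other subdomains that import it."""
    used_by = defaultdict(set)
    for record in records:
        for target in record["imports"]:
            # a subdomain using its own types is not an inbound dependency
            if target != record["name"]:
                used_by[target].add(record["name"])
    return {target: sorted(sources) for target, sources in used_by.items()}


# the first two records of the exported data
records = [
    {'imports': ['comment', 'framework', 'creator', 'site'], 'name': 'comment'},
    {'imports': ['site', 'mail', 'framework', 'creator'], 'name': 'mail'},
]
used_by = inbound_dependencies(records)
print(used_by["creator"])
```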

Where it gets interesting is the following visualization of the dependencies of the subdomain site:


In [152]:
query="""
MATCH
    (type:Type)
WHERE
    type.fqn STARTS WITH "at.dropover"
WITH DISTINCT type
MATCH
    (d1:Domain:Business)<-[:BELONGS_TO]-(type:Type),
    (type)-[:DEPENDS_ON*0..1]->(directDependency:Type),
    (directDependency)-[:BELONGS_TO]->(d2:Business:Domain)
RETURN d1.name as name, COLLECT(DISTINCT d2.name) as imports
"""
import json

json_data = graph.run(query).data()
with open("vis/flare-imports.json", mode='w') as json_file:
    json_file.write(json.dumps(json_data, indent=3))

json_data[:2]


Out[152]:
[]

In [111]:
query="""
MATCH
    (type:Type)
WHERE
    type.fqn STARTS WITH "at.dropover"
WITH DISTINCT type
MATCH
    (d1:Domain:Business)<-[:BELONGS_TO]-(type:Type),
    (type)-[r:DEPENDS_ON*0..1]->(directDependency:Type),
    (directDependency)-[:BELONGS_TO]->(d2:Business:Domain)
RETURN d1.name as name, d2.name, COUNT(r) as number
"""
json_data = graph.run(query).data()
df = pd.DataFrame(json_data)
data = df.to_dict(orient='split')['data']
with open("vis/chord_data.json", mode='w') as json_file:
    json_file.write(json.dumps(data, indent=3))
data[:5]


Out[111]:
[['files', 'framework', 4],
 ['site', 'files', 11],
 ['site', 'mail', 4],
 ['creator', 'framework', 4],
 ['files', 'site', 1]]
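Chord layouts such as the one in D3 typically consume a square matrix rather than a list of triples. Whether the visualization used here converts the data in JavaScript is not shown, so the following is only a sketch of that conversion in Python. The function name to_matrix is made up, and which position in a triple holds the depending subdomain follows the DataFrame's column order, so treat the orientation as an assumption.

```python
def to_matrix(triples):
    """Turn [first, second, count] triples into (names, square matrix)."""
    names = sorted({name for first, second, _ in triples for name in (first, second)})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for first, second, count in triples:
        # row = first entry of the triple, column = second entry
        matrix[index[first]][index[second]] += count
    return names, matrix


# the five rows shown in the output above
triples = [
    ['files', 'framework', 4],
    ['site', 'files', 11],
    ['site', 'mail', 4],
    ['creator', 'framework', 4],
    ['files', 'site', 1],
]
names, matrix = to_matrix(triples)
print(names)
for row in matrix:
    print(row)
```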

A more sophisticated use case

Even if there aren't any package naming conventions, you can often identify some structure, for example in class names or in your inheritance hierarchy, that points you towards the subdomains in your code. If that isn't possible either: I wrote my Master's thesis about mining cohesive concepts from source code via text mining, so you could use that as well :-D . And as a last resort, you have to do the mapping manually.

Let's see how this could work by mapping business subdomains to the class names of the Spring PetClinic project.

Again, we start with a list of all types in the application:


In [37]:
import py2neo
import pandas as pd

query="""
MATCH
    (:Project)-[:CONTAINS]->(artifact:Artifact)-[:CONTAINS]->(type:Type)
RETURN type.fqn as fqn, type.name as name
"""

graph = py2neo.Graph()
subdomaininfo = pd.DataFrame(graph.run(query).data())
subdomaininfo.head()


Out[37]:
fqn name
0 org.springframework.samples.petclinic.web.Visi... VisitControllerTests
1 org.springframework.samples.petclinic.model.Va... ValidatorTests
2 org.springframework.samples.petclinic.web.Cras... CrashControllerTests
3 org.springframework.samples.petclinic.service.... AbstractClinicServiceTests
4 org.springframework.samples.petclinic.web.PetT... PetTypeFormatterTests$1

First, let's assume that we have some subdomains of our business domain we know about:


In [38]:
subdomains = ['Owner', 'Pet', 'Visit', 'Vet', 'Specialty', 'Clinic']

In [52]:
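def determine_subdomain(name):
    """Return the first known subdomain contained in a class name, else "Framework"."""
    # plain substring match, checked in the order of the subdomains list
    for feature in subdomains:
        if feature in name:
            return feature

    # no subdomain keyword found: treat the type as technical code
    return "Framework"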

In [53]:
subdomaininfo['subdomain'] = subdomaininfo['name'].apply(determine_subdomain)
subdomaininfo.head()


Out[53]:
fqn name subdomain
0 org.springframework.samples.petclinic.web.Visi... VisitControllerTests Visit
1 org.springframework.samples.petclinic.model.Va... ValidatorTests Framework
2 org.springframework.samples.petclinic.web.Cras... CrashControllerTests Framework
3 org.springframework.samples.petclinic.service.... AbstractClinicServiceTests Clinic
4 org.springframework.samples.petclinic.web.PetT... PetTypeFormatterTests$1 Pet
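One caveat of the substring match: a class name such as PetclinicInitializer (used here purely as an illustration) contains "Pet" and would land in the Pet subdomain even though it is technical code. A possible, stricter variant compares whole camel-case words instead. This is a sketch under that assumption, not the approach used in the rest of this notebook; the names split_camel_case and determine_subdomain_by_word are hypothetical.

```python
import re

SUBDOMAINS = ['Owner', 'Pet', 'Visit', 'Vet', 'Specialty', 'Clinic']


def split_camel_case(name):
    """Split a Java simple class name into its capitalized words."""
    return re.findall(r'[A-Z][a-z0-9]*', name)


def determine_subdomain_by_word(name, subdomains=SUBDOMAINS):
    """Return the first subdomain that appears as a whole word in the name."""
    words = split_camel_case(name)
    for subdomain in subdomains:
        if subdomain in words:
            return subdomain
    # no subdomain word found: treat the type as technical code
    return "Framework"


for name in ["VisitControllerTests", "PetTypeFormatterTests$1", "PetclinicInitializer"]:
    print(name, "->", determine_subdomain_by_word(name))
```

The trade-off is that plural or compound forms no longer match, so the keyword list may need a few more entries.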

In [59]:
query="""
// $param is the current parameter syntax; the legacy {param} form was removed in Neo4j 4.0
UNWIND $subdomaininfo as info
MERGE (subdomain:Domain:Business { name: info.subdomain })
WITH info, subdomain
MATCH (n:Type { fqn: info.fqn})
MERGE (n)-[:BELONGS_TO]->(subdomain)
RETURN n.fqn as type_fqn, subdomain.name as subdomain
"""

result = graph.run(query, subdomaininfo=subdomaininfo.to_dict(orient='records')).data()
pd.DataFrame(result).head()


Out[59]:
subdomain type_fqn
0 Visit org.springframework.samples.petclinic.web.Visi...
1 Framework org.springframework.samples.petclinic.model.Va...
2 Framework org.springframework.samples.petclinic.web.Cras...
3 Clinic org.springframework.samples.petclinic.service....
4 Pet org.springframework.samples.petclinic.web.PetT...
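Since "Framework" acts as a catch-all bucket, it can be worth checking how many types end up there before trusting the mapping. A minimal sketch with pandas, using only the five rows displayed above (so the numbers say nothing about the whole project); the variable names are made up:

```python
import pandas as pd

# the five name/subdomain pairs from the output above
sample = pd.DataFrame({
    "name": ["VisitControllerTests", "ValidatorTests", "CrashControllerTests",
             "AbstractClinicServiceTests", "PetTypeFormatterTests$1"],
    "subdomain": ["Visit", "Framework", "Framework", "Clinic", "Pet"],
})

# number of types per subdomain
counts = sample["subdomain"].value_counts()
# share of types that matched no subdomain keyword
framework_share = counts.get("Framework", 0) / len(sample)

print(counts)
print(framework_share)
```

A large Framework share would hint that the keyword list is missing subdomains.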

In [98]:
query="""
MATCH
    (:Project)-[:CONTAINS]->(artifact:Artifact)-[:CONTAINS]->(type:Type)
WHERE
    // we don't want to analyze test artifacts
    NOT artifact.type = "test-jar" 
WITH DISTINCT type, artifact
MATCH
    (d1:Domain:Business)<-[:BELONGS_TO]-(type:Type),
    (type)-[r:DEPENDS_ON*0..1]->(directDependency:Type),
    (directDependency)-[:BELONGS_TO]->(d2:Business:Domain),
    (directDependency)<-[:CONTAINS]-(artifact)
RETURN d1.name as name, d2.name, COUNT(r) as number
"""

json_data = graph.run(query).data()
df = pd.DataFrame(json_data)
df.to_dict(orient='split')


Out[98]:
{'columns': ['d2.name', 'name', 'number'],
 'data': [['Framework', 'Visit', 1],
  ['Visit', 'Clinic', 3],
  ['Framework', 'Owner', 2],
  ['Pet', 'Visit', 5],
  ['Specialty', 'Vet', 2],
  ['Visit', 'Visit', 10],
  ['Owner', 'Owner', 8],
  ['Pet', 'Owner', 4],
  ['Clinic', 'Vet', 1],
  ['Clinic', 'Pet', 2],
  ['Framework', 'Framework', 3],
  ['Framework', 'Vet', 2],
  ['Vet', 'Vet', 11],
  ['Clinic', 'Clinic', 1],
  ['Framework', 'Pet', 3],
  ['Clinic', 'Owner', 1],
  ['Owner', 'Pet', 4],
  ['Clinic', 'Visit', 1],
  ['Pet', 'Pet', 21],
  ['Framework', 'Specialty', 1],
  ['Visit', 'Pet', 4],
  ['Owner', 'Clinic', 3],
  ['Vet', 'Clinic', 3],
  ['Pet', 'Clinic', 5]],
 'index': [0,
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9,
  10,
  11,
  12,
  13,
  14,
  15,
  16,
  17,
  18,
  19,
  20,
  21,
  22,
  23]}
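Note that the columns entry above lists d2.name before name, so each row reads as target, source, count. That order comes from how the DataFrame arranged the columns, not from the query. A small sketch that pins the order explicitly before exporting; the three records are taken from the output above, and the variable names are only for illustration:

```python
import pandas as pd

# three of the records above, as returned by graph.run(query).data()
records = [
    {'name': 'Visit', 'd2.name': 'Framework', 'number': 1},
    {'name': 'Clinic', 'd2.name': 'Visit', 'number': 3},
    {'name': 'Owner', 'd2.name': 'Framework', 'number': 2},
]
df = pd.DataFrame(records)

# select the columns by name so that every row is [source, target, count]
ordered = df[['name', 'd2.name', 'number']]
data = ordered.to_dict(orient='split')['data']
print(data)
```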

As in the simple example, the graph now looks like this:


In [14]:
import pandas as pd
pd.DataFrame(json_data).head()


Out[14]:
n.fqn s.name t.name u.name
0 org.springframework.samples.petclinic.Petclini... Pet petclinic Initializer
1 org.springframework.samples.petclinic.reposito... Pet jpa Repository
2 org.springframework.samples.petclinic.web.Visi... Visit web Controller
3 org.springframework.samples.petclinic.reposito... Vet jdbc Repository
4 org.springframework.samples.petclinic.web.Cras... Framework web Controller

In [81]:
query="""
MATCH
    (:Project)-[:CONTAINS]->(artifact:Artifact)-[:CONTAINS]->(type:Type)
WHERE
    // we don't want to analyze test artifacts
    NOT artifact.type = "test-jar" 
WITH DISTINCT type, artifact
MATCH
    (d1:Domain:Business)<-[:BELONGS_TO]-(type:Type),
    (type)-[:DEPENDS_ON*0..1]->(directDependency:Type),
    (directDependency)-[:BELONGS_TO]->(d2:Business:Domain),
    (directDependency)<-[:CONTAINS]-(artifact)
RETURN d1.name as name, COLLECT(DISTINCT d2.name) as imports
"""
json_data = graph.run(query).data()

import json
with open("vis/flare-imports.json", mode='w') as json_file:
    json_file.write(json.dumps(json_data, indent=3))
    
json_data


Out[81]:
[]

In [113]:
query="""
MATCH
    (:Project)-[:CONTAINS]->(artifact:Artifact)-[:CONTAINS]->(type:Type)
WHERE
    // we don't want to analyze test artifacts
    NOT artifact.type = "test-jar" 
WITH DISTINCT type, artifact
MATCH
    (d1:Domain:Business)<-[:BELONGS_TO]-(type:Type),
    (type)-[r:DEPENDS_ON*0..1]->(directDependency:Type),
    (directDependency)-[:BELONGS_TO]->(d2:Business:Domain),
    (directDependency)<-[:CONTAINS]-(artifact)
RETURN d1.name as name, d2.name, COUNT(r) as number
"""


json_data = graph.run(query).data()
df = pd.DataFrame(json_data)
data = df.to_dict(orient='split')['data']
with open("vis/chord_data.json", mode='w') as json_file:
    json_file.write(json.dumps(data, indent=3))
data[:5]


Out[113]:
[['Framework', 'Visit', 1],
 ['Visit', 'Clinic', 3],
 ['Framework', 'Owner', 2],
 ['Pet', 'Visit', 5],
 ['Specialty', 'Vet', 2]]

Bonus: Dependencies between subdomains


In [2]:
import pandas as pd
import py2neo

graph = py2neo.Graph()

# materialize dependencies between the Domain:Business nodes created by the mapping step above
query="""
MATCH
    (t1:Type)-[:BELONGS_TO]->(s1:Domain:Business),
    (t2:Type)-[:BELONGS_TO]->(s2:Domain:Business),
    (t1)-[:DEPENDS_ON]->(t2)
WHERE s1.name <> s2.name
MERGE (s1)-[:DEPENDS_ON]->(s2)
RETURN DISTINCT s1.name, s2.name
"""

pd.DataFrame(graph.run(query).data()).head()

Additionally, this gives us the dependencies between the various business subdomains as explicit relationships in the graph, which can also be visualized with D3 as described in Analyze Dependencies between Business Subdomains.
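For readers who prefer to see the aggregation spelled out, the step from type-level to subdomain-level dependencies in this section can be mimicked in plain Python. This is only an illustrative sketch: the function name subdomain_edges and the type names in the sample are hypothetical, and the real work stays in the graph database.

```python
def subdomain_edges(type_dependencies, subdomain_of):
    """Collapse (source_type, target_type) pairs into distinct subdomain edges."""
    edges = set()
    for source_type, target_type in type_dependencies:
        source = subdomain_of.get(source_type)
        target = subdomain_of.get(target_type)
        # skip types without a mapping and dependencies inside one subdomain
        if source is None or target is None or source == target:
            continue
        edges.add((source, target))
    return sorted(edges)


# hypothetical types and dependencies, for illustration only
subdomain_of = {
    "OwnerController": "Owner",
    "PetController": "Pet",
    "PetValidator": "Pet",
}
type_dependencies = [
    ("PetController", "OwnerController"),
    ("PetController", "PetValidator"),
    ("OwnerController", "PetController"),
    ("PetController", "OwnerController"),
]
edges = subdomain_edges(type_dependencies, subdomain_of)
print(edges)
```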